Predictive Maintenance for Centrifugal Pumps
Project Description
This project focuses on developing a Predictive Maintenance System for Centrifugal Pumps used in chemical industries. By leveraging machine learning algorithms and sensor data, the goal is to predict potential failures before they occur, optimizing maintenance schedules, minimizing downtime, and reducing operational costs.
Dataset Parameters
The dataset simulates operational data collected from centrifugal pumps. Key parameters include:
- Air Temperature [K]: Ambient temperature near the equipment.
- Process Temperature [K]: Temperature of the fluid being pumped.
- Rotational Speed [rpm]: Speed of the pump impeller.
- Torque [Nm]: Motor torque applied to drive the pump.
- Tool Wear [min]: Cumulative wear of critical components like bearings and impellers.
- Target: Indicator of pump failure.
- Failure Type: One of No Failure, Power Failure, Tool Wear Failure, Overstrain Failure, Random Failures, Heat Dissipation Failure.
What Are Centrifugal Pumps?
Centrifugal pumps are mechanical devices designed to move fluids by converting rotational kinetic energy from a motor into hydrodynamic energy. They operate based on centrifugal force, where the rotation of an impeller increases the fluid's velocity and pressure.
Uses in Chemical Industries
- Fluid Transfer: Transporting chemicals, solvents, and process liquids across different units.
- Reaction Processes: Circulating reactants in chemical reactors.
- Cooling Systems: Pumping cooling water in heat exchangers.
- Filtration Systems: Driving fluids through filtration units.
Centrifugal pumps are indispensable in chemical manufacturing, ensuring smooth and efficient operations.
Predictive Maintenance
Predictive maintenance is a proactive strategy that uses data analysis tools and techniques to identify potential equipment failures before they occur. Unlike reactive or preventive maintenance, it optimizes maintenance schedules by predicting the actual condition of equipment.
How It Works:
- Data Collection: Sensors monitor critical parameters like speed, temperature, and torque.
- Data Analysis: Historical data is analyzed to identify patterns and anomalies.
- Machine Learning Models: Algorithms predict the likelihood of failures based on sensor data.
- Actionable Insights: Maintenance teams are alerted to repair or replace components proactively.
Why Is It Crucial and Beneficial?
- Reduces Downtime: Minimizes unexpected breakdowns.
- Cost-Effective: Prevents over-maintenance and reduces repair costs.
- Improves Safety: Avoids catastrophic failures that could endanger workers or the environment.
- Enhances Efficiency: Ensures optimal equipment performance.
Current Industry Practices
Industries are increasingly adopting machine learning for predictive maintenance. Tools like anomaly detection, time-series forecasting, and classification models are integrated with IoT-enabled systems to monitor and maintain equipment health.
- Case Studies: Companies like GE and Siemens deploy AI-driven predictive maintenance solutions for pumps and compressors.
- Real-Time Monitoring: Systems continuously monitor sensor data and predict failures using cloud-based platforms.
- Scalable Solutions: Machine learning models adapt to different equipment and environments.
Approach to the Problem
This is a classification problem: given a set of operating conditions and parameters, predict whether a centrifugal pump will fail. Because we need both the failure/no-failure outcome and the type of failure, we build a two-step model.
The first model predicts failure or no failure; the second determines which type of failure it is.
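The two-step idea can be sketched independently of any particular model. The block below is only an illustration: `two_stage_predict`, `toy_stage_one`, and `toy_stage_two` are hypothetical names, and the toy rules stand in for the trained classifiers built later in the notebook.

```python
from typing import Callable, Optional, Sequence


def two_stage_predict(
    features: Sequence[float],
    predict_failure: Callable[[Sequence[float]], int],
    predict_failure_type: Callable[[Sequence[float]], str],
) -> Optional[str]:
    """Return None when no failure is predicted, else the predicted failure type."""
    if predict_failure(features) == 0:
        return None  # stage 1 reports a healthy pump, so stage 2 is skipped
    return predict_failure_type(features)


# Hypothetical stand-ins for the two trained models (assumptions, not the notebook's models).
def toy_stage_one(features: Sequence[float]) -> int:
    return 1 if features[0] > 60.0 else 0  # flag unusually high torque


def toy_stage_two(features: Sequence[float]) -> str:
    return "Overstrain Failure"


print(two_stage_predict([42.8], toy_stage_one, toy_stage_two))
print(two_stage_predict([70.0], toy_stage_one, toy_stage_two))
```

Keeping the two stages as separate callables means either model can be swapped out without touching the dispatch logic.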
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 8)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
df = pd.read_csv('predictive_maintenance.csv')
df
| | UDI | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | Failure Type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | M14860 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | No Failure |
| 1 | 2 | L47181 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | No Failure |
| 2 | 3 | L47182 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | No Failure |
| 3 | 4 | L47183 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | No Failure |
| 4 | 5 | L47184 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | No Failure |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 9996 | M24855 | M | 298.8 | 308.4 | 1604 | 29.5 | 14 | 0 | No Failure |
| 9996 | 9997 | H39410 | H | 298.9 | 308.4 | 1632 | 31.8 | 17 | 0 | No Failure |
| 9997 | 9998 | M24857 | M | 299.0 | 308.6 | 1645 | 33.4 | 22 | 0 | No Failure |
| 9998 | 9999 | H39412 | H | 299.0 | 308.7 | 1408 | 48.5 | 25 | 0 | No Failure |
| 9999 | 10000 | M24859 | M | 299.0 | 308.7 | 1500 | 40.2 | 30 | 0 | No Failure |
10000 rows × 10 columns
df_copy = df.copy()
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   UDI                      10000 non-null  int64
 1   Product ID               10000 non-null  object
 2   Type                     10000 non-null  object
 3   Air temperature [K]      10000 non-null  float64
 4   Process temperature [K]  10000 non-null  float64
 5   Rotational speed [rpm]   10000 non-null  int64
 6   Torque [Nm]              10000 non-null  float64
 7   Tool wear [min]          10000 non-null  int64
 8   Target                   10000 non-null  int64
 9   Failure Type             10000 non-null  object
dtypes: float64(3), int64(4), object(3)
memory usage: 781.4+ KB
- No column contains null values.
- The columns use three data types: float64, int64, and object.
df.describe()
| | UDI | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target |
|---|---|---|---|---|---|---|---|
| count | 10000.00000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 |
| mean | 5000.50000 | 300.004930 | 310.005560 | 1538.776100 | 39.986910 | 107.951000 | 0.033900 |
| std | 2886.89568 | 2.000259 | 1.483734 | 179.284096 | 9.968934 | 63.654147 | 0.180981 |
| min | 1.00000 | 295.300000 | 305.700000 | 1168.000000 | 3.800000 | 0.000000 | 0.000000 |
| 25% | 2500.75000 | 298.300000 | 308.800000 | 1423.000000 | 33.200000 | 53.000000 | 0.000000 |
| 50% | 5000.50000 | 300.100000 | 310.100000 | 1503.000000 | 40.100000 | 108.000000 | 0.000000 |
| 75% | 7500.25000 | 301.500000 | 311.100000 | 1612.000000 | 46.800000 | 162.000000 | 0.000000 |
| max | 10000.00000 | 304.500000 | 313.800000 | 2886.000000 | 76.600000 | 253.000000 | 1.000000 |
- The data looks good, with reasonable maximum and minimum values.
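The summary table also shows how rare failures are, which matters when reading accuracy scores later. A small worked calculation, using only the count and the mean of the Target column from `df.describe()` (the variable names are illustrative):

```python
n_rows = 10_000        # "count" row of df.describe()
failure_rate = 0.0339  # "mean" of the Target column

n_failures = round(n_rows * failure_rate)
n_healthy = n_rows - n_failures

# Accuracy of a trivial model that always predicts "no failure".
baseline_accuracy = n_healthy / n_rows

print(n_failures, n_healthy, baseline_accuracy)
```

With roughly 3.4% of rows marked as failures, a classifier has to beat about 96.6% accuracy before accuracy alone says anything about its ability to detect failures.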
Exploratory Analysis and Visualization
fig = px.histogram(df,x='Air temperature [K]',marginal='box',nbins = 100,title='Distribution of Air temperature [K]')
fig.update_layout(bargap=0.2)
fig.show()
fig = px.histogram(df,x='Process temperature [K]',marginal='box',nbins = 100,title='Distribution of Process temperature [K]',color_discrete_sequence=['green'])
fig.update_layout(bargap=0.2)
fig.show()
From the plots above we can read off the maximum temperatures recorded by the sensors: 304.5 K for air and 313.8 K for the process fluid.
import plotly.express as px
df['Target'] = df['Target'].astype(int)
df['Target'] = df['Target'].astype(str)
fig = px.scatter(df,
x='Rotational speed [rpm]',
y='Torque [Nm]',
opacity=1,
title='RPM VS NM',
color='Target',
color_discrete_sequence=['blue','red'])
fig.update_traces(marker=dict(size=2))
fig.update_layout(width=1200, height=700)
fig.show()
df['Target'].unique()
array(['0', '1'], dtype=object)
import plotly.express as px
df['Target'] = df['Target'].astype(int)
df['Target'] = df['Target'].astype(str)
fig = px.scatter(df,
x='Process temperature [K]',
y='Torque [Nm]',
opacity=1,
title='Process temperature [K] VS Torque',
color='Target',
color_discrete_sequence=['blue','red'])
fig.update_traces(marker=dict(size=2))
fig.update_layout(width=1000, height=700)
fig.show()
import plotly.express as px
df['Target'] = df['Target'].astype(int)
df['Target'] = df['Target'].astype(str)
fig = px.scatter(df,
x='Air temperature [K]',
y='Torque [Nm]',
opacity=1,
title='Air temperature [K] VS Torque',
color='Target',
color_discrete_sequence=['blue','red'])
fig.update_traces(marker=dict(size=2))
fig.update_layout(width=1000, height=700)
fig.show()
import plotly.express as px
df['Target'] = df['Target'].astype(int)
# Convert 'Target' to categorical (optional but recommended)
df['Target'] = df['Target'].astype(str)
fig = px.scatter(df,
x='Rotational speed [rpm]',
y='Torque [Nm]',
opacity=1,
title='Rotational speed [rpm] VS Torque',
color='Failure Type',
)
# Set marker size
fig.update_traces(marker=dict(size=2))
# Set figure size
fig.update_layout(width=1000, height=700)
fig.show()
FEATURE ENGINEERING
df
| | UDI | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | Failure Type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | M14860 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | No Failure |
| 1 | 2 | L47181 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | No Failure |
| 2 | 3 | L47182 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | No Failure |
| 3 | 4 | L47183 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | No Failure |
| 4 | 5 | L47184 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | No Failure |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 9996 | M24855 | M | 298.8 | 308.4 | 1604 | 29.5 | 14 | 0 | No Failure |
| 9996 | 9997 | H39410 | H | 298.9 | 308.4 | 1632 | 31.8 | 17 | 0 | No Failure |
| 9997 | 9998 | M24857 | M | 299.0 | 308.6 | 1645 | 33.4 | 22 | 0 | No Failure |
| 9998 | 9999 | H39412 | H | 299.0 | 308.7 | 1408 | 48.5 | 25 | 0 | No Failure |
| 9999 | 10000 | M24859 | M | 299.0 | 308.7 | 1500 | 40.2 | 30 | 0 | No Failure |
10000 rows × 10 columns
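The first model is trained on the Target column, so the Failure Type column will be dropped before training. Before that, we add rolling-window statistics of the rotational speed as features.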
df['rolling_mean_rpm'] = df['Rotational speed [rpm]'].rolling(window=10,min_periods=1).mean()
df['rolling_std_rpm'] = df['Rotational speed [rpm]'].rolling(window=10,min_periods=1).std()
df['rolling_var_rpm'] = df['Rotational speed [rpm]'].rolling(window=10,min_periods=1).var()
df
| | UDI | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | Failure Type | rolling_mean_rpm | rolling_std_rpm | rolling_var_rpm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | M14860 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | No Failure | 1551.000000 | NaN | NaN |
| 1 | 2 | L47181 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | No Failure | 1479.500000 | 101.116270 | 10224.500000 |
| 2 | 3 | L47182 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | No Failure | 1485.666667 | 72.293384 | 5226.333333 |
| 3 | 4 | L47183 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | No Failure | 1472.500000 | 64.634872 | 4177.666667 |
| 4 | 5 | L47184 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | No Failure | 1459.600000 | 62.970628 | 3965.300000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 9996 | M24855 | M | 298.8 | 308.4 | 1604 | 29.5 | 14 | 0 | No Failure | 1583.200000 | 131.944264 | 17409.288889 |
| 9996 | 9997 | H39410 | H | 298.9 | 308.4 | 1632 | 31.8 | 17 | 0 | No Failure | 1595.700000 | 129.827278 | 16855.122222 |
| 9997 | 9998 | M24857 | M | 299.0 | 308.6 | 1645 | 33.4 | 22 | 0 | No Failure | 1610.200000 | 125.991887 | 15873.955556 |
| 9998 | 9999 | H39412 | H | 299.0 | 308.7 | 1408 | 48.5 | 25 | 0 | No Failure | 1573.900000 | 126.805582 | 16079.655556 |
| 9999 | 10000 | M24859 | M | 299.0 | 308.7 | 1500 | 40.2 | 30 | 0 | No Failure | 1566.200000 | 128.916683 | 16619.511111 |
10000 rows × 13 columns
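To make the rolling columns concrete, the following plain-Python sketch mimics a trailing window with `min_periods=1` on the first five rotational speeds from the table. `rolling_stats` is a hypothetical helper written for illustration; the notebook itself uses the pandas `rolling` call shown above.

```python
import math
from statistics import mean, stdev


def rolling_stats(values, window=10):
    """Mean and sample standard deviation over a trailing window (min_periods=1)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        avg = mean(chunk)
        # A sample standard deviation needs at least two points,
        # which is why the first row of the table shows NaN.
        std = stdev(chunk) if len(chunk) > 1 else math.nan
        out.append((avg, std))
    return out


rpm = [1551, 1408, 1498, 1433, 1408]  # first five rows of 'Rotational speed [rpm]'
for avg, std in rolling_stats(rpm):
    print(round(avg, 6), round(std, 6))
```

The second row reproduces the 1479.5 mean and 101.116270 standard deviation in the table. The variance column is the square of the standard deviation, so it carries the same information on a different scale.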
df['torque_trend'] = df['Torque [Nm]'].diff()
df['temperature_trend'] = df['Process temperature [K]'].diff()
df
| | UDI | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | Failure Type | rolling_mean_rpm | rolling_std_rpm | rolling_var_rpm | torque_trend | temperature_trend |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | M14860 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | No Failure | 1551.000000 | NaN | NaN | NaN | NaN |
| 1 | 2 | L47181 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | No Failure | 1479.500000 | 101.116270 | 10224.500000 | 3.5 | 0.1 |
| 2 | 3 | L47182 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | No Failure | 1485.666667 | 72.293384 | 5226.333333 | 3.1 | -0.2 |
| 3 | 4 | L47183 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | No Failure | 1472.500000 | 64.634872 | 4177.666667 | -9.9 | 0.1 |
| 4 | 5 | L47184 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | No Failure | 1459.600000 | 62.970628 | 3965.300000 | 0.5 | 0.1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 9996 | M24855 | M | 298.8 | 308.4 | 1604 | 29.5 | 14 | 0 | No Failure | 1583.200000 | 131.944264 | 17409.288889 | 1.6 | 0.1 |
| 9996 | 9997 | H39410 | H | 298.9 | 308.4 | 1632 | 31.8 | 17 | 0 | No Failure | 1595.700000 | 129.827278 | 16855.122222 | 2.3 | 0.0 |
| 9997 | 9998 | M24857 | M | 299.0 | 308.6 | 1645 | 33.4 | 22 | 0 | No Failure | 1610.200000 | 125.991887 | 15873.955556 | 1.6 | 0.2 |
| 9998 | 9999 | H39412 | H | 299.0 | 308.7 | 1408 | 48.5 | 25 | 0 | No Failure | 1573.900000 | 126.805582 | 16079.655556 | 15.1 | 0.1 |
| 9999 | 10000 | M24859 | M | 299.0 | 308.7 | 1500 | 40.2 | 30 | 0 | No Failure | 1566.200000 | 128.916683 | 16619.511111 | -8.3 | 0.0 |
10000 rows × 15 columns
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['Failure Type Encoded'] = le.fit_transform(df['Failure Type'])
dict(zip(le.classes_, le.transform(le.classes_)))
{'Heat Dissipation Failure': np.int64(0),
'No Failure': np.int64(1),
'Overstrain Failure': np.int64(2),
'Power Failure': np.int64(3),
'Random Failures': np.int64(4),
'Tool Wear Failure': np.int64(5)}
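The integer codes above come from sorting the class labels alphabetically, which is how `LabelEncoder` orders `classes_`. A plain-Python sketch that reproduces the same mapping and its inverse (the names `encoding` and `decoding` are illustrative):

```python
failure_types = [
    "No Failure", "Power Failure", "Tool Wear Failure",
    "Overstrain Failure", "Random Failures", "Heat Dissipation Failure",
]

# Sorted labels get consecutive integer codes starting at 0.
encoding = {label: code for code, label in enumerate(sorted(failure_types))}
# The inverse mapping turns a predicted code back into a readable label.
decoding = {code: label for label, code in encoding.items()}

print(encoding)
print(decoding[3])
```

Keeping the inverse mapping next to the encoder avoids hard-coding the code-to-label table again at prediction time.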
df = df.drop('Failure Type', axis=1)
df
| | UDI | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | rolling_mean_rpm | rolling_std_rpm | rolling_var_rpm | torque_trend | temperature_trend | Failure Type Encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | M14860 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | 1551.000000 | NaN | NaN | NaN | NaN | 1 |
| 1 | 2 | L47181 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | 1479.500000 | 101.116270 | 10224.500000 | 3.5 | 0.1 | 1 |
| 2 | 3 | L47182 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | 1485.666667 | 72.293384 | 5226.333333 | 3.1 | -0.2 | 1 |
| 3 | 4 | L47183 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | 1472.500000 | 64.634872 | 4177.666667 | -9.9 | 0.1 | 1 |
| 4 | 5 | L47184 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | 1459.600000 | 62.970628 | 3965.300000 | 0.5 | 0.1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 9996 | M24855 | M | 298.8 | 308.4 | 1604 | 29.5 | 14 | 0 | 1583.200000 | 131.944264 | 17409.288889 | 1.6 | 0.1 | 1 |
| 9996 | 9997 | H39410 | H | 298.9 | 308.4 | 1632 | 31.8 | 17 | 0 | 1595.700000 | 129.827278 | 16855.122222 | 2.3 | 0.0 | 1 |
| 9997 | 9998 | M24857 | M | 299.0 | 308.6 | 1645 | 33.4 | 22 | 0 | 1610.200000 | 125.991887 | 15873.955556 | 1.6 | 0.2 | 1 |
| 9998 | 9999 | H39412 | H | 299.0 | 308.7 | 1408 | 48.5 | 25 | 0 | 1573.900000 | 126.805582 | 16079.655556 | 15.1 | 0.1 | 1 |
| 9999 | 10000 | M24859 | M | 299.0 | 308.7 | 1500 | 40.2 | 30 | 0 | 1566.200000 | 128.916683 | 16619.511111 | -8.3 | 0.0 | 1 |
10000 rows × 15 columns
df = df.drop(columns=['UDI', 'Product ID'])
df
| | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | rolling_mean_rpm | rolling_std_rpm | rolling_var_rpm | torque_trend | temperature_trend | Failure Type Encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | 1551.000000 | NaN | NaN | NaN | NaN | 1 |
| 1 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | 1479.500000 | 101.116270 | 10224.500000 | 3.5 | 0.1 | 1 |
| 2 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | 1485.666667 | 72.293384 | 5226.333333 | 3.1 | -0.2 | 1 |
| 3 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | 1472.500000 | 64.634872 | 4177.666667 | -9.9 | 0.1 | 1 |
| 4 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | 1459.600000 | 62.970628 | 3965.300000 | 0.5 | 0.1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | M | 298.8 | 308.4 | 1604 | 29.5 | 14 | 0 | 1583.200000 | 131.944264 | 17409.288889 | 1.6 | 0.1 | 1 |
| 9996 | H | 298.9 | 308.4 | 1632 | 31.8 | 17 | 0 | 1595.700000 | 129.827278 | 16855.122222 | 2.3 | 0.0 | 1 |
| 9997 | M | 299.0 | 308.6 | 1645 | 33.4 | 22 | 0 | 1610.200000 | 125.991887 | 15873.955556 | 1.6 | 0.2 | 1 |
| 9998 | H | 299.0 | 308.7 | 1408 | 48.5 | 25 | 0 | 1573.900000 | 126.805582 | 16079.655556 | 15.1 | 0.1 | 1 |
| 9999 | M | 299.0 | 308.7 | 1500 | 40.2 | 30 | 0 | 1566.200000 | 128.916683 | 16619.511111 | -8.3 | 0.0 | 1 |
10000 rows × 13 columns
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['Quality Type Encoded'] = le.fit_transform(df['Type'])
df_1 = df.copy()
df
| | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | rolling_mean_rpm | rolling_std_rpm | rolling_var_rpm | torque_trend | temperature_trend | Failure Type Encoded | Quality Type Encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | 1551.000000 | NaN | NaN | NaN | NaN | 1 | 2 |
| 1 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | 1479.500000 | 101.116270 | 10224.500000 | 3.5 | 0.1 | 1 | 1 |
| 2 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | 1485.666667 | 72.293384 | 5226.333333 | 3.1 | -0.2 | 1 | 1 |
| 3 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | 1472.500000 | 64.634872 | 4177.666667 | -9.9 | 0.1 | 1 | 1 |
| 4 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | 1459.600000 | 62.970628 | 3965.300000 | 0.5 | 0.1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | M | 298.8 | 308.4 | 1604 | 29.5 | 14 | 0 | 1583.200000 | 131.944264 | 17409.288889 | 1.6 | 0.1 | 1 | 2 |
| 9996 | H | 298.9 | 308.4 | 1632 | 31.8 | 17 | 0 | 1595.700000 | 129.827278 | 16855.122222 | 2.3 | 0.0 | 1 | 0 |
| 9997 | M | 299.0 | 308.6 | 1645 | 33.4 | 22 | 0 | 1610.200000 | 125.991887 | 15873.955556 | 1.6 | 0.2 | 1 | 2 |
| 9998 | H | 299.0 | 308.7 | 1408 | 48.5 | 25 | 0 | 1573.900000 | 126.805582 | 16079.655556 | 15.1 | 0.1 | 1 | 0 |
| 9999 | M | 299.0 | 308.7 | 1500 | 40.2 | 30 | 0 | 1566.200000 | 128.916683 | 16619.511111 | -8.3 | 0.0 | 1 | 2 |
10000 rows × 14 columns
df = df.drop('Type', axis =1)
df
| | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | rolling_mean_rpm | rolling_std_rpm | rolling_var_rpm | torque_trend | temperature_trend | Failure Type Encoded | Quality Type Encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | 1551.000000 | NaN | NaN | NaN | NaN | 1 | 2 |
| 1 | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | 1479.500000 | 101.116270 | 10224.500000 | 3.5 | 0.1 | 1 | 1 |
| 2 | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | 1485.666667 | 72.293384 | 5226.333333 | 3.1 | -0.2 | 1 | 1 |
| 3 | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | 1472.500000 | 64.634872 | 4177.666667 | -9.9 | 0.1 | 1 | 1 |
| 4 | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | 1459.600000 | 62.970628 | 3965.300000 | 0.5 | 0.1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 298.8 | 308.4 | 1604 | 29.5 | 14 | 0 | 1583.200000 | 131.944264 | 17409.288889 | 1.6 | 0.1 | 1 | 2 |
| 9996 | 298.9 | 308.4 | 1632 | 31.8 | 17 | 0 | 1595.700000 | 129.827278 | 16855.122222 | 2.3 | 0.0 | 1 | 0 |
| 9997 | 299.0 | 308.6 | 1645 | 33.4 | 22 | 0 | 1610.200000 | 125.991887 | 15873.955556 | 1.6 | 0.2 | 1 | 2 |
| 9998 | 299.0 | 308.7 | 1408 | 48.5 | 25 | 0 | 1573.900000 | 126.805582 | 16079.655556 | 15.1 | 0.1 | 1 | 0 |
| 9999 | 299.0 | 308.7 | 1500 | 40.2 | 30 | 0 | 1566.200000 | 128.916683 | 16619.511111 | -8.3 | 0.0 | 1 | 2 |
10000 rows × 13 columns
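# Row 0 has NaN in the rolling std/var and trend columns, so drop it.
df = df.drop(index=0).reset_index(drop=True)
# The first model predicts Target only, so the encoded failure type is removed from the features.
df = df.drop('Failure Type Encoded', axis =1)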
df
| | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | rolling_mean_rpm | rolling_std_rpm | rolling_var_rpm | torque_trend | temperature_trend | Quality Type Encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | 1479.500000 | 101.116270 | 10224.500000 | 3.5 | 0.1 | 1 |
| 1 | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | 1485.666667 | 72.293384 | 5226.333333 | 3.1 | -0.2 | 1 |
| 2 | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | 1472.500000 | 64.634872 | 4177.666667 | -9.9 | 0.1 | 1 |
| 3 | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | 1459.600000 | 62.970628 | 3965.300000 | 0.5 | 0.1 | 1 |
| 4 | 298.1 | 308.6 | 1425 | 41.9 | 11 | 0 | 1453.833333 | 58.066915 | 3371.766667 | 1.9 | -0.1 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9994 | 298.8 | 308.4 | 1604 | 29.5 | 14 | 0 | 1583.200000 | 131.944264 | 17409.288889 | 1.6 | 0.1 | 2 |
| 9995 | 298.9 | 308.4 | 1632 | 31.8 | 17 | 0 | 1595.700000 | 129.827278 | 16855.122222 | 2.3 | 0.0 | 0 |
| 9996 | 299.0 | 308.6 | 1645 | 33.4 | 22 | 0 | 1610.200000 | 125.991887 | 15873.955556 | 1.6 | 0.2 | 2 |
| 9997 | 299.0 | 308.7 | 1408 | 48.5 | 25 | 0 | 1573.900000 | 126.805582 | 16079.655556 | 15.1 | 0.1 | 0 |
| 9998 | 299.0 | 308.7 | 1500 | 40.2 | 30 | 0 | 1566.200000 | 128.916683 | 16619.511111 | -8.3 | 0.0 | 2 |
9999 rows × 12 columns
from sklearn.model_selection import train_test_split
X = df.drop(columns=['Target'])
y = df['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.columns = X_train.columns.str.replace(r"[\[\]<>]", "", regex=True)
X_test.columns = X_test.columns.str.replace(r"[\[\]<>]", "", regex=True)
from xgboost import XGBClassifier
y_train = y_train.astype(int)
y_test = y_test.astype(int)
model = XGBClassifier(n_estimators=100, learning_rate=0.05)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
y_pred = model.predict(X_test)
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print("\nAccuracy Score:", accuracy_score(y_test, y_pred))
Confusion Matrix:
[[1930 6]
[ 27 37]]
Classification Report:
precision recall f1-score support
0 0.99 1.00 0.99 1936
1 0.86 0.58 0.69 64
accuracy 0.98 2000
macro avg 0.92 0.79 0.84 2000
weighted avg 0.98 0.98 0.98 2000
Accuracy Score: 0.9835
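The failure-class figures in the report follow directly from the confusion matrix. A short worked calculation in plain Python, using only the four counts printed above:

```python
# Rows are true classes, columns are predicted classes.
tn, fp = 1930, 6   # true class 0 (no failure)
fn, tp = 27, 37    # true class 1 (failure)

precision = tp / (tp + fp)  # share of predicted failures that were real
recall = tp / (tp + fn)     # share of real failures that were caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(precision, 2), round(recall, 2), round(f1, 2), accuracy)
```

The recall line is the one to watch for maintenance: 27 of the 64 real failures in the test set were predicted as healthy.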
Accuracy is high at 0.9835, although recall on the failure class is only 0.58. Let's pass the model a hand-written input to see how it behaves on unseen data.
import numpy as np
custom_data = np.array([[298.2, 202.3, 3200,35.0,2,1300.500000, 300,5222,-0.1,1,1]])
custom_pred = model.predict(custom_data)
print("Predicted Target (1 = failure):", custom_pred[0])
Predicted Target (1 = failure): 0
Let's train a second model, a random forest, on the same task.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
X = df.drop(columns=['Target'])
y = df['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
X_train.columns = X_train.columns.str.replace(r"[\[\]<>]", "", regex=True)
X_test.columns = X_test.columns.str.replace(r"[\[\]<>]", "", regex=True)
y_train = y_train.astype(int)
y_test = y_test.astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print("\nAccuracy Score:", accuracy_score(y_test, y_pred))
Confusion Matrix:
[[2416 3]
[ 42 39]]
Classification Report:
precision recall f1-score support
0 0.98 1.00 0.99 2419
1 0.93 0.48 0.63 81
accuracy 0.98 2500
macro avg 0.96 0.74 0.81 2500
weighted avg 0.98 0.98 0.98 2500
Accuracy Score: 0.982
We have now trained two machine learning models with accuracy scores of 0.9835 (XGBoost) and 0.982 (random forest). The ultimate goal is also to identify the failure type, so we train one more model: when the first model predicts a failure, the second predicts which type of failure it is.
df_1
| | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | rolling_mean_rpm | rolling_std_rpm | rolling_var_rpm | torque_trend | temperature_trend | Failure Type Encoded | Quality Type Encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | 1551.000000 | NaN | NaN | NaN | NaN | 1 | 2 |
| 1 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | 1479.500000 | 101.116270 | 10224.500000 | 3.5 | 0.1 | 1 | 1 |
| 2 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | 1485.666667 | 72.293384 | 5226.333333 | 3.1 | -0.2 | 1 | 1 |
| 3 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | 1472.500000 | 64.634872 | 4177.666667 | -9.9 | 0.1 | 1 | 1 |
| 4 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | 1459.600000 | 62.970628 | 3965.300000 | 0.5 | 0.1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | M | 298.8 | 308.4 | 1604 | 29.5 | 14 | 0 | 1583.200000 | 131.944264 | 17409.288889 | 1.6 | 0.1 | 1 | 2 |
| 9996 | H | 298.9 | 308.4 | 1632 | 31.8 | 17 | 0 | 1595.700000 | 129.827278 | 16855.122222 | 2.3 | 0.0 | 1 | 0 |
| 9997 | M | 299.0 | 308.6 | 1645 | 33.4 | 22 | 0 | 1610.200000 | 125.991887 | 15873.955556 | 1.6 | 0.2 | 1 | 2 |
| 9998 | H | 299.0 | 308.7 | 1408 | 48.5 | 25 | 0 | 1573.900000 | 126.805582 | 16079.655556 | 15.1 | 0.1 | 1 | 0 |
| 9999 | M | 299.0 | 308.7 | 1500 | 40.2 | 30 | 0 | 1566.200000 | 128.916683 | 16619.511111 | -8.3 | 0.0 | 1 | 2 |
10000 rows × 14 columns
df_1 = df_1.drop(index=0).reset_index(drop=True)
df_1['Failure Type Encoded'].unique()
array([1, 3, 5, 2, 4, 0])
df_1.sample(30)
| | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | rolling_mean_rpm | rolling_std_rpm | rolling_var_rpm | torque_trend | temperature_trend | Failure Type Encoded | Quality Type Encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2146 | L | 299.4 | 308.9 | 1550 | 33.7 | 174 | 0 | 1516.1 | 115.291370 | 13292.100000 | 0.8 | 0.1 | 1 | 1 |
| 923 | L | 295.5 | 306.0 | 1800 | 27.6 | 208 | 0 | 1518.4 | 122.254925 | 14946.266667 | -17.7 | 0.0 | 1 | 1 |
| 2825 | M | 300.3 | 309.4 | 1612 | 33.1 | 150 | 0 | 1480.0 | 126.455965 | 15991.111111 | -0.4 | 0.0 | 1 | 2 |
| 9347 | L | 298.2 | 308.7 | 1436 | 49.1 | 26 | 0 | 1504.1 | 116.846956 | 13653.211111 | 6.5 | 0.0 | 1 | 1 |
| 7310 | H | 300.0 | 310.5 | 1558 | 36.7 | 137 | 0 | 1479.5 | 142.797642 | 20391.166667 | 3.2 | 0.0 | 1 | 0 |
| 2620 | M | 299.5 | 309.3 | 1506 | 41.5 | 77 | 0 | 1611.1 | 311.317077 | 96918.322222 | 5.8 | 0.1 | 1 | 2 |
| 7940 | M | 300.7 | 311.7 | 1499 | 38.9 | 3 | 0 | 1558.0 | 160.645918 | 25807.111111 | -1.1 | 0.0 | 1 | 2 |
| 9612 | L | 299.0 | 310.2 | 1377 | 62.5 | 92 | 1 | 1501.5 | 88.390862 | 7812.944444 | 30.7 | 0.0 | 3 | 1 |
| 7527 | M | 300.1 | 311.3 | 1748 | 26.6 | 42 | 0 | 1557.2 | 208.548316 | 43492.400000 | -18.6 | 0.0 | 1 | 2 |
| 1235 | H | 297.1 | 308.4 | 1359 | 46.6 | 176 | 0 | 1611.0 | 338.146582 | 114343.111111 | 15.2 | 0.0 | 1 | 0 |
| 2763 | M | 299.9 | 309.3 | 1389 | 52.8 | 0 | 0 | 1465.1 | 125.725848 | 15806.988889 | 5.3 | 0.1 | 1 | 2 |
| 4060 | L | 301.9 | 310.8 | 1906 | 21.7 | 62 | 0 | 1646.0 | 293.130612 | 85925.555556 | -33.8 | -0.1 | 1 | 1 |
| 4659 | L | 303.2 | 311.2 | 1439 | 43.9 | 30 | 0 | 1562.0 | 144.936768 | 21006.666667 | -0.6 | 0.0 | 1 | 1 |
| 2611 | L | 299.4 | 309.1 | 2421 | 14.2 | 57 | 0 | 1620.8 | 330.595355 | 109293.288889 | -7.9 | 0.0 | 1 | 1 |
| 7714 | L | 300.5 | 311.5 | 1302 | 49.6 | 80 | 0 | 1560.0 | 235.973162 | 55683.333333 | 6.6 | 0.0 | 1 | 1 |
| 4534 | L | 302.4 | 310.2 | 1503 | 36.2 | 166 | 0 | 1492.6 | 84.128473 | 7077.600000 | 8.2 | 0.0 | 1 | 1 |
| 4746 | L | 303.3 | 311.2 | 1763 | 27.3 | 27 | 0 | 1550.6 | 231.892791 | 53774.266667 | -18.3 | 0.0 | 1 | 1 |
| 8523 | L | 298.3 | 309.4 | 1468 | 46.2 | 10 | 0 | 1508.7 | 94.041421 | 8843.788889 | 4.5 | 0.1 | 1 | 1 |
| 7124 | L | 300.7 | 310.1 | 1261 | 56.6 | 98 | 0 | 1627.0 | 296.414500 | 87861.555556 | 32.1 | -0.2 | 1 | 1 |
| 6137 | M | 300.8 | 310.7 | 1452 | 43.0 | 139 | 0 | 1567.1 | 181.970358 | 33113.211111 | 23.3 | -0.1 | 1 | 2 |
| 2247 | M | 299.3 | 308.5 | 1304 | 60.6 | 8 | 0 | 1476.7 | 128.189140 | 16432.455556 | 12.1 | 0.1 | 1 | 2 |
| 8093 | H | 300.2 | 311.6 | 1662 | 28.8 | 162 | 0 | 1524.3 | 179.168481 | 32101.344444 | -6.9 | 0.0 | 1 | 0 |
| 4375 | M | 301.9 | 309.6 | 1551 | 34.6 | 211 | 0 | 1512.2 | 123.878794 | 15345.955556 | -1.7 | -0.1 | 1 | 2 |
| 2973 | M | 300.6 | 309.4 | 1521 | 37.2 | 90 | 0 | 1489.6 | 147.501563 | 21756.711111 | -8.8 | 0.1 | 1 | 2 |
| 4741 | L | 303.3 | 311.3 | 1592 | 33.7 | 14 | 0 | 1542.3 | 217.595981 | 47348.011111 | -19.2 | 0.0 | 1 | 1 |
| 4803 | L | 303.7 | 312.6 | 1621 | 38.8 | 182 | 0 | 1465.2 | 127.659965 | 16297.066667 | 7.4 | 0.1 | 1 | 1 |
| 1697 | L | 297.9 | 307.6 | 1481 | 38.0 | 40 | 0 | 1572.2 | 98.111954 | 9625.955556 | -6.0 | 0.0 | 1 | 1 |
| 8725 | L | 297.2 | 308.6 | 1562 | 33.0 | 84 | 0 | 1519.1 | 81.147944 | 6584.988889 | -9.1 | 0.1 | 1 | 1 |
| 9067 | H | 297.1 | 308.2 | 1790 | 30.3 | 128 | 0 | 1692.8 | 252.602366 | 63807.955556 | -26.3 | 0.0 | 1 | 0 |
| 5410 | L | 302.8 | 312.6 | 1462 | 39.4 | 22 | 0 | 1539.9 | 206.189476 | 42514.100000 | -12.1 | 0.1 | 1 | 1 |
df_1 = df_1.drop('Type', axis =1)
df_1 = df_1.drop('Target', axis=1)
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
X = df_1.drop(columns=['Failure Type Encoded'])
y = df_1['Failure Type Encoded']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
X_train.columns = X_train.columns.str.replace(r"[\[\]<>]", "", regex=True)
X_test.columns = X_test.columns.str.replace(r"[\[\]<>]", "", regex=True)
y_train = y_train.astype(int)
y_test = y_test.astype(int)
failure_type_encoded_model = RandomForestClassifier(n_estimators=100, random_state=42)
failure_type_encoded_model.fit(X_train, y_train)
y_pred = failure_type_encoded_model.predict(X_test)
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
print("\nAccuracy Score:", accuracy_score(y_test, y_pred))
Confusion Matrix:
[[ 10 15 0 0 0 0]
[ 1 2415 0 0 0 0]
[ 0 13 7 1 0 0]
[ 0 5 0 16 0 0]
[ 0 4 0 0 0 0]
[ 0 12 1 0 0 0]]
Classification Report:
precision recall f1-score support
0 0.91 0.40 0.56 25
1 0.98 1.00 0.99 2416
2 0.88 0.33 0.48 21
3 0.94 0.76 0.84 21
4 0.00 0.00 0.00 4
5 0.00 0.00 0.00 13
accuracy 0.98 2500
macro avg 0.62 0.42 0.48 2500
weighted avg 0.97 0.98 0.97 2500
Accuracy Score: 0.9792
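The gap between the 0.98 accuracy and the 0.42 macro-average recall can be reproduced from the confusion matrix. A minimal sketch in plain Python, using only the numbers printed above:

```python
# Rows are true classes 0..5, columns are predicted classes 0..5.
confusion = [
    [10, 15, 0, 0, 0, 0],    # 0: Heat Dissipation Failure
    [1, 2415, 0, 0, 0, 0],   # 1: No Failure
    [0, 13, 7, 1, 0, 0],     # 2: Overstrain Failure
    [0, 5, 0, 16, 0, 0],     # 3: Power Failure
    [0, 4, 0, 0, 0, 0],      # 4: Random Failures
    [0, 12, 1, 0, 0, 0],     # 5: Tool Wear Failure
]

# Recall per class: correct predictions on the diagonal over the row total.
per_class_recall = [row[i] / sum(row) for i, row in enumerate(confusion)]
macro_recall = sum(per_class_recall) / len(per_class_recall)

correct = sum(row[i] for i, row in enumerate(confusion))
accuracy = correct / sum(sum(row) for row in confusion)

print([round(r, 2) for r in per_class_recall])
print(round(macro_recall, 2), accuracy)
```

Accuracy is dominated by the 2416 "No Failure" rows, while the macro average weights all six classes equally, so the two classes that were never predicted correctly (Random Failures and Tool Wear Failure) pull it down sharply.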
import numpy as np
custom_data = np.array([[302.0,309.9, 38,57.6,197,1527.6,175.857392,30925.822222,30.4, 0.0,0]])
target_pred = model.predict(custom_data)
print("Predicted Target (Failure or Not):", target_pred[0])
if target_pred[0] == 1:
    failure_type_pred = failure_type_encoded_model.predict(custom_data)
    failure_types_encoded = {
        0: "Heat Dissipation Failure",
        1: "No Failure",
        2: "Overstrain Failure",
        3: "Power Failure",
        4: "Random Failures",
        5: "Tool Wear Failure",
    }
    print("Predicted Failure Type:", failure_types_encoded.get(failure_type_pred[0], "Unknown Failure Type"))
else:
    print("No failure detected.")
Predicted Target (Failure or Not): 1
Predicted Failure Type: Heat Dissipation Failure
import numpy as np
custom_data = np.array([[300.3,309.9,1394,46.7,210,1492.4,72.9216,5317.600000,-5.4, 0.0,0]])
target_pred = model.predict(custom_data)
print("Predicted Target (Failure or Not):", target_pred[0])
if target_pred[0] == 1:
    failure_type_pred = failure_type_encoded_model.predict(custom_data)
    failure_types_encoded = {
        0: "Heat Dissipation Failure",
        1: "No Failure",
        2: "Overstrain Failure",
        3: "Power Failure",
        4: "Random Failures",
        5: "Tool Wear Failure",
    }
    print("Predicted Failure Type:", failure_types_encoded.get(failure_type_pred[0], "Unknown Failure Type"))
else:
    print("No failure detected.")
Predicted Target (Failure or Not): 1
Predicted Failure Type: Tool Wear Failure
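Passing features as a bare positional array makes it easy to put a value in the wrong slot. One way to guard against that, sketched here under the assumption that the feature order matches the training columns shown earlier, is to build the row from named readings. `FEATURE_ORDER` and `build_feature_row` are hypothetical helper names, not part of the notebook.

```python
# Assumed to match the column order of X used to train both models.
FEATURE_ORDER = [
    "Air temperature [K]",
    "Process temperature [K]",
    "Rotational speed [rpm]",
    "Torque [Nm]",
    "Tool wear [min]",
    "rolling_mean_rpm",
    "rolling_std_rpm",
    "rolling_var_rpm",
    "torque_trend",
    "temperature_trend",
    "Quality Type Encoded",
]


def build_feature_row(reading: dict) -> list:
    """Order a dict of named readings into the positional layout the models expect."""
    missing = [name for name in FEATURE_ORDER if name not in reading]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(reading[name]) for name in FEATURE_ORDER]


# Same values as the second hand-written example above.
reading = {
    "Air temperature [K]": 300.3,
    "Process temperature [K]": 309.9,
    "Rotational speed [rpm]": 1394,
    "Torque [Nm]": 46.7,
    "Tool wear [min]": 210,
    "rolling_mean_rpm": 1492.4,
    "rolling_std_rpm": 72.9216,
    "rolling_var_rpm": 5317.6,
    "torque_trend": -5.4,
    "temperature_trend": 0.0,
    "Quality Type Encoded": 0,
}
row = build_feature_row(reading)
print(row)
```

Wrapping the result in a list, as in `[row]`, gives the two-dimensional shape that `predict` expects for a single sample.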
Real-Time Predictive Maintenance System: How It Works
In a real-time predictive maintenance system, the model continuously receives input from the sensors on the equipment (like pumps), processes the data, and compares it against the patterns it has learned during training. Here's a step-by-step breakdown of how it works:
1. Real-Time Data Collection:
- Sensors on the equipment (e.g., centrifugal pumps) collect real-time operational data like rotational speed (RPM), temperature, torque, vibration, and other relevant parameters.
- This data is continuously transmitted to a central system or cloud platform via an IoT network.
2. Input to the Model:
- Preprocessing: The raw sensor data might be preprocessed (for example, by calculating rolling means, standard deviations, or trends, as discussed earlier).
- This preprocessed data becomes the input to the machine learning model. Every time new data is collected, it serves as fresh input to the model for analysis.
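In a streaming setting the rolling features have to be updated one reading at a time rather than computed over a whole DataFrame. A minimal sketch using a fixed-length buffer follows; the `RollingFeatures` class is a hypothetical helper, and the window of 10 simply mirrors the window used during feature engineering.

```python
import math
from collections import deque
from statistics import mean, stdev


class RollingFeatures:
    """Keep the last `window` RPM readings and the previous torque value."""

    def __init__(self, window: int = 10):
        self.rpm = deque(maxlen=window)  # oldest reading is dropped automatically
        self.prev_torque = None

    def update(self, rpm: float, torque: float) -> dict:
        self.rpm.append(rpm)
        # Sample std needs two points; the first reading yields NaN, as in the notebook.
        std = stdev(self.rpm) if len(self.rpm) > 1 else math.nan
        trend = math.nan if self.prev_torque is None else torque - self.prev_torque
        self.prev_torque = torque
        return {
            "rolling_mean_rpm": mean(self.rpm),
            "rolling_std_rpm": std,
            "torque_trend": trend,
        }


tracker = RollingFeatures(window=10)
print(tracker.update(1551, 42.8))
print(tracker.update(1408, 46.3))
```

Each call returns the derived features for the newest reading, ready to be combined with the raw sensor values before prediction.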
3. Model Comparison & Prediction:
The trained machine learning model continuously compares the incoming real-time data to the patterns and trends it learned during the training phase.
The model checks whether the current values (e.g., RPM, temperature, torque) match those associated with normal operation or indicate signs of impending failure.
For example:
- If the RPM deviates from the expected rolling mean, it might signal that the pump is operating inefficiently, which could lead to failure.
- If the temperature is rising unusually or fluctuating, it could indicate overheating or a malfunction.
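The RPM example above can be expressed as a simple deviation check. This is only an illustrative rule, not what the trained classifiers do internally: `is_rpm_anomalous` and the three-sigma threshold are assumptions, and the mean and standard deviation are taken from the `df.describe()` output rather than from a live rolling window.

```python
import math


def is_rpm_anomalous(rpm: float, mean_rpm: float, std_rpm: float, n_sigmas: float = 3.0) -> bool:
    """Flag a reading that lies more than n_sigmas standard deviations from the mean."""
    if math.isnan(std_rpm) or std_rpm == 0:
        return False  # not enough history to judge
    return abs(rpm - mean_rpm) > n_sigmas * std_rpm


MEAN_RPM = 1538.7761   # from df.describe()
STD_RPM = 179.284096   # from df.describe()

print(is_rpm_anomalous(2886, MEAN_RPM, STD_RPM))  # the dataset maximum
print(is_rpm_anomalous(1551, MEAN_RPM, STD_RPM))  # a typical reading
```

A rule like this can run alongside the classifier as a cheap sanity check on incoming data.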
4. Failure Prediction:
- Based on the comparison, the model makes a real-time prediction:
  - "No Failure": If the system detects that the equipment is operating normally.
  - "Failure Predicted": If the model detects signs that suggest a potential failure within a specific time window (e.g., 24 hours, 48 hours).
- The model uses its learned thresholds (like high RPM, high torque, etc.) or patterns to determine whether an alert should be triggered.
5. Alert/Action:
- If the model predicts a failure or detects anomalies, it alerts operators or triggers a maintenance action. The system could issue an alert like:
  - "Warning: Torque is higher than expected, indicating a potential blockage or resistance."
  - "Warning: Temperature is increasing rapidly, indicating overheating."
- Operators or maintenance teams can then take action to prevent failure, such as adjusting the pump, performing a quick inspection, or scheduling downtime for repairs.
6. Continuous Monitoring:
- This process happens continuously, with the system constantly comparing the latest data to the model's predictions, ensuring that the equipment is always being monitored for potential issues.
- The model is always "on," updating its predictions as new data comes in.
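The monitoring loop described in these steps can be sketched as a generator that consumes readings and yields alerts. Everything here is illustrative: `monitor`, `toy_model`, the torque threshold, and the alert wording are assumptions, and a deployed system would read from an IoT stream and call the trained models instead.

```python
from typing import Callable, Iterable, Iterator


def monitor(readings: Iterable[dict], predict_failure: Callable[[dict], int]) -> Iterator[str]:
    """Score each incoming reading and yield an alert message when a failure is predicted."""
    for i, reading in enumerate(readings):
        if predict_failure(reading) == 1:
            yield f"reading {i}: failure predicted, notify maintenance"


# Hypothetical rule standing in for the trained classifier.
def toy_model(reading: dict) -> int:
    return 1 if reading["torque"] > 60 else 0


stream = [{"torque": 42.8}, {"torque": 62.5}, {"torque": 40.0}]
alerts = list(monitor(stream, toy_model))
print(alerts)
```

Because `monitor` is a generator, it works unchanged on an unbounded stream: alerts are produced as readings arrive rather than after the stream ends.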